1.0 INTRODUCTION


The mission of the Information Understanding (IU) lab is to significantly improve the design, development, evaluation, and deployment of Government and commercial software applications for the information analyst. The IU Branch, now the Situation Awareness (SA) Branch, evaluates and integrates applications and technologies that have emerged from numerous Government programs, as well as concepts developed in-house. The IU/SA lab seeks to develop a clearly defined set of evaluation metrics that would help achieve software and information superiority and dominance. The lab acquires tools from the Top Sail and Eagle programs, from Small Business Innovation Research (SBIR) efforts, and from other Government and in-house efforts. It evaluates these tools to help establish a baseline for required equipment and manpower. Tools are classified and then evaluated for performance, integration, and transition. Another function of the IU/SA lab is the development of real-world scenarios to test and evaluate the software. Finally, the lab creates large data sets to test current and future tools and to evaluate their scalability.


1.1 INFORMATION UNDERSTANDING BACKGROUND


The IU lab of the Information Understanding Branch (IFTB) consisted of three Government scientists, a number of summer students, and one contractor. In late 2003, this group began to evaluate technological innovations and to identify potential enhancements and improvements to knowledge discovery, information extraction, question-answering, and cognitive reasoning tools. The group worked to identify new sources of information for testing tools, to find capabilities and functions within tools that could be used with other tools, and to define the flow of information among tools. It pursued this mission through 2004, when the IFTB became the Situation Awareness (SA) Branch (IFED). The mission of the SA Branch is to perform leading-edge research and development of technologies that enable computationally intelligent systems for predictive situational awareness, complex reasoning, and situation understanding. IFED provides an integrated suite of tools and techniques that give the war fighter, from the Commander down to each combatant, comprehensive knowledge of the battle space in support of the decision-making process.


1.2 KNOWLEDGE DISCOVERY


Intelligence analysts are often asked to analyze large amounts of data to find the small, vital piece of the puzzle or a hotspot of information in a huge dataset. A hotspot is loosely defined as a pattern of information that does not necessarily fit the normal patterns within that data. When applying knowledge discovery techniques to large amounts of data, an analyst is more interested in finding a flow of information through patterns or models of information than in finding trends within the data. Such patterns or models can serve as warning signals to the intelligence analyst. By identifying relevant, understandable patterns or models within the information, it is possible to transform that information into knowledge.
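
The report defines a hotspot only loosely. As a minimal sketch, assuming a hotspot can be approximated by an entity whose activity count departs sharply from the rest of the data, a simple z-score test flags such outliers. The function name, the per-entity counts, and the default threshold of two standard deviations are illustrative assumptions, not a method used by any tool described in this report.

```python
from statistics import mean, pstdev


def find_hotspots(counts: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return the keys whose value lies more than `threshold` population
    standard deviations from the mean of all values (a z-score test).
    `counts` must be non-empty."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return sorted(k for k, v in counts.items() if abs(v - mu) / sigma > threshold)


if __name__ == "__main__":
    # Hypothetical report counts per entity: nine ordinary entities, one outlier.
    counts = {f"entity{i}": 10 for i in range(9)}
    counts["entity9"] = 100
    print(find_hotspots(counts))
```

With nine entities at 10 reports each and one at 100, only the last entity is flagged (its z-score is 3.0). Real hotspot detection would work on richer patterns than raw counts.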

One method of finding these patterns of information within data is Knowledge Graph Matching. The process of graph pattern matching involves comparing graphs for similarities. Analysts use knowledge discovery tools to arrive at accurate intelligence decisions because the tools let them review and use large amounts of information in a timely and orderly manner. These tools help the analyst set aside information that may be unrelated or outdated, so the analyst can spend more time analyzing highly relevant information and less time retrieving it.


1.3 INFORMATION EXTRACTION TECHNOLOGY


Information extraction is data extraction and data organization, combined with visualization capabilities that make it easier for intelligence analysts to find, extract, visualize, and otherwise exploit data of interest from unformatted sources of text of multiple types (e.g., message traffic, news wire feeds, document databases, and open sources) and across multiple domains. The extracted, structured information can then be stored in a database and used to feed analysis and visualization tools, such as data browsers and link analysis displays, enabling analysts to support war fighters and decision makers with mission planning, campaign assessments, and other analysis products and services.
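
A toy illustration of the pipeline described above: pull structured fields out of free-text messages and store them in a database that downstream analysis tools could query. The message layout (`FROM <sender> TO <recipient>: ... AT <place>`), the regular expression, and the `contacts` table are hypothetical; operational extraction systems rely on far more robust linguistic processing than a single pattern. Python's built-in `re` and `sqlite3` modules keep the sketch self-contained.

```python
import re
import sqlite3

# Hypothetical message layout: "FROM <sender> TO <recipient>: <free text> AT <place>"
CONTACT = re.compile(r"FROM (?P<sender>\w+) TO (?P<recipient>\w+).*? AT (?P<place>\w+)")


def extract(messages):
    """Yield (sender, recipient, place) for each message matching the layout."""
    for msg in messages:
        m = CONTACT.search(msg)
        if m:  # messages that do not fit the layout are skipped
            yield m.group("sender"), m.group("recipient"), m.group("place")


def load(messages, conn):
    """Store extracted records in a `contacts` table for downstream tools."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (sender TEXT, recipient TEXT, place TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", extract(messages))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(["FROM Alpha TO Bravo: meeting confirmed AT Depot",
          "weather report, no contacts"], conn)
    print(conn.execute("SELECT * FROM contacts").fetchall())
```

Once the records are in a table, a data browser or link analysis display can query them like any other structured source.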


1.4 TOOL EVALUATIONS


The following paragraphs review some of the tools evaluated by the IU/SA lab.

TMODS (Terrorist Modus Operandi Discovery System), developed by 21st Century Technologies, is an exact, inexact, and partial graph matching tool. TMODS operates on an attributed, directed graph in which the nodes represent entities such as groups, individuals, or resources, and the edges represent relationships among the nodes. Both nodes and edges can have attributes that describe their properties. The primary purpose of TMODS is to find sub-graphs with certain topological or attribute characteristics within the input graph; these sub-graphs are referred to as patterns. The output of TMODS is a list of exact or partial matches of a target pattern, along with a measure of confidence in each match.
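
To make exact versus partial matching concrete, the sketch below places a small pattern into an attributed, directed graph by brute force and scores each placement by the fraction of pattern edges it satisfies. This is an illustrative assumption about how a confidence measure might be computed, not the actual TMODS algorithm; the dictionary and triple encodings are hypothetical, and exhaustive search is only practical for very small graphs.

```python
from itertools import permutations

# Illustrative encoding (not TMODS's):
#   nodes: {node_id: entity_type}
#   edges: {(source_id, relation, target_id), ...}


def match_pattern(graph_nodes, graph_edges, pat_nodes, pat_edges, min_conf=0.5):
    """Brute-force search for placements of a pattern graph in an input graph.

    A placement maps each pattern node to a distinct input node of the same
    entity type. Its confidence is the fraction of pattern edges present in
    the input graph: 1.0 is an exact match, lower values are partial matches.
    Returns (confidence, mapping) pairs, best first.
    """
    if not pat_edges:
        raise ValueError("pattern must contain at least one edge")
    pat_ids = list(pat_nodes)
    results = []
    # Try every injective assignment of pattern nodes to input nodes.
    for combo in permutations(graph_nodes, len(pat_ids)):
        mapping = dict(zip(pat_ids, combo))
        # Node attributes (here, only the entity type) must agree exactly.
        if any(graph_nodes[mapping[p]] != pat_nodes[p] for p in pat_ids):
            continue
        hits = sum((mapping[s], r, mapping[d]) in graph_edges for s, r, d in pat_edges)
        confidence = hits / len(pat_edges)
        if confidence >= min_conf:
            results.append((confidence, mapping))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


if __name__ == "__main__":
    graph_nodes = {"p1": "person", "p2": "person", "g1": "group", "r1": "resource"}
    graph_edges = {("p1", "member_of", "g1"), ("p2", "member_of", "g1"),
                   ("g1", "acquires", "r1"), ("p2", "funds", "r1")}
    pattern_nodes = {"X": "person", "G": "group", "R": "resource"}
    pattern_edges = {("X", "member_of", "G"), ("G", "acquires", "R"),
                     ("X", "funds", "R")}
    for conf, m in match_pattern(graph_nodes, graph_edges, pattern_nodes, pattern_edges):
        print(round(conf, 2), m)
```

In this example the placement X = p2 satisfies all three pattern edges (confidence 1.0, an exact match), while X = p1 lacks the `funds` edge and is reported as a partial match with confidence 2/3.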

LAW (Link Analysis Workbench), developed by SRI International, is a web-accessible tool with which analysts and machines collaboratively perform link analysis. They do this by defining hierarchical and temporal patterns that include uncertain and qualitative elements, and by defining domain-dependent and domain-independent search strategies for pattern application. Analysts interact with LAW through a graphical user interface (GUI) that supports direct graphical browsing and editing of patterns, search strategies, and summaries and details of resulting matches.

Proximity, developed by the Knowledge Discovery Laboratory (c/o Professor David Jensen, Director), Department of Computer Science, University of Massachusetts Amherst, allows users to easily understand and modify large relational data sets. In addition, users can build statistical relational models to enhance their understanding of the data and to make predictions about new data.

POLE STAR, developed by BAE Systems, is a fact collecting and argument construction tool that enables analysts, working in a collaborative environment, to collect text from websites displayed in Microsoft Internet Explorer and from Microsoft Word documents. An analysis of the information collected, along with additional metadata describing source, reliability, and classification, is provided to the Fact Collector component for the generation of intelligence products.

The ISX Tool Kit, developed by ISX, combines three tools: (1) the Thematic Argument Group (TAG) Manager, (2) the Semantic Navigator, and (3) the Thematic Argument Group (TAG) Log. The TAG Manager is a Groove-based application designed to provide role-based information access and collaboration process control. It provides a natural interface for information sharing and collaboration processes within the Groove collaboration environment, and individuals may choose an image to represent themselves within the workgroup. The Semantic Navigator is a Groove-based application designed to make information, rather than “files,” the currency of analysis, collaboration, and exchange. The TAG Log is a Groove-based application designed to capture workgroup activity and provide a temporal view of that activity.

HITIQA is a question-answering system driven by natural-language human-computer dialogue. HITIQA is sponsored by ARDA (Advanced Research and Development Activity) and the State University of New York at Albany.

WebTAS, developed by Northrop Grumman and the Air Force Research Laboratory (AFRL), is a modular software toolset that supports fusion of large amounts of disparate data, visualization, project organization and management, pattern analysis, activity prediction, and various presentation aids.

CADRE (Continuous Analysis and Discovery from Relation Evidence), developed by BAE Systems, automatically links and matches data with user-specified threat patterns, evaluates alternative hypotheses, and automatically generates queries for more data.

The TBM Reasoner and CAAT (Theater Ballistic Missile Reasoner and Course of Action Analysis Tool), developed by BAE Systems, form a knowledge-based decision aid that helps analysts rapidly identify time-critical targets and named areas of interest from ground moving target indicator (GMTI) data.


Catalyst, developed by General Dynamics Advanced Information Systems, is a modeling application for representing and analyzing problems, particularly those pertaining to potential threats and adversaries’ courses of action. Catalyst provides an explicit external framework that supports the analytic decision-making process and allows analysts to develop a problem- or task-specific organizational structure that can be used for organizing incoming or discovered data. It also allows analysts to develop organizational structures in advance or to develop and refine their structures as data and information become available.

ITEA (Intermediate Text Extraction), developed by General Dynamics, is a tool that exploits text documents and messages (particularly the unstructured prose portions of the documents) to support intelligence analysts in their job of supporting decision makers, mission planners, and war fighters. ITEA extracts structured information from this text, which can then be stored in a database and used to feed analysis and visualization tools such as data browsers and link analysis displays. This can enable analysts to support war fighters and decision makers with mission planning, campaign assessments, and other analysis products and services.

IDP (Information Discovery Portal), developed by JANYA, allows an analyst to view important data from thousands of documents and integrates several search and browsing capabilities with visualization and charting tools. IDP also has the capability to create Cross-Document Entity Profiles. It is built with InfoXtract technology for text extraction.
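
A cross-document entity profile gathers what is known about one entity from many documents. The sketch below shows only the aggregation step, under the assumption that an extractor has already produced (document, entity, attribute) mentions. Its naive name normalization (case-folding and whitespace collapsing) is a stand-in for the much harder cross-document coreference a product such as IDP must perform, and all names are hypothetical.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Naive name normalization: case-fold and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_profiles(mentions):
    """Merge (doc_id, entity_name, attribute) mentions into one profile per entity.

    Each profile records the documents that mention the entity and the
    attributes extracted about it across those documents.
    """
    profiles = defaultdict(lambda: {"documents": set(), "attributes": set()})
    for doc_id, name, attribute in mentions:
        profile = profiles[normalize(name)]
        profile["documents"].add(doc_id)
        profile["attributes"].add(attribute)
    return dict(profiles)


if __name__ == "__main__":
    mentions = [("doc1", "John  Smith", "engineer"),
                ("doc2", "john smith", "Boston"),
                ("doc2", "ACME Corp", "company")]
    for entity, profile in build_profiles(mentions).items():
        print(entity, sorted(profile["documents"]), sorted(profile["attributes"]))
```

Here the two spellings of the same name collapse into a single profile spanning both documents; real systems must also handle aliases, titles, and ambiguous names.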

Analyst Notebook, developed by i2 Technologies, provides capabilities for accumulating, investigating, analyzing, and displaying complex information and relationships. The Analyst Notebook is a graphical software product designed to display and analyze intelligence relating to an investigation. It provides a wide range of methods to support analysis, help navigate large networks of data, unravel complex relationships, and quickly discover underlying interconnections.

SEAS (Structured Evidential Argumentation System) or SRI Early Alert System, developed by SRI International, aids analysts in predicting potential opportunities and crises. It is implemented as a web server that supports the construction and exploitation of a corporate memory filled with analytic products, methods, and their interrelationships, indexed by the situations to which they apply. Objects from this corporate memory are viewed and edited using a standard browser client, with the SEAS server producing temporary HTML based upon the contents of the SEAS knowledge base that constitutes corporate memory.


1.5 DEVELOPING METRICS FOR KNOWLEDGE DISCOVERY AND INFORMATION EXTRACTION


Defining metrics for knowledge discovery and information extraction tools can be a difficult and challenging process, and engineers within IFTB/IFED have struggled with it for some time. The following functional performance areas form the basis for our metrics development:

• Accuracy
• Efficiency
• Stability
• Improvement
• Error handling

Other factors to consider include usability, adaptability, and scalability. Lab engineers have encountered many problems when trying to develop these metrics. Usability and adaptability are hard to define in the lab environment. Experiments are modeled on real-world intelligence missions and scenarios, but it is very difficult to model these environments closely enough to provide an accurate assessment of a tool's capabilities. In addition, usability can be in the eye of the beholder: what works for one analyst and his or her mission may not work for another. Accuracy, which is vital to intelligence missions, is also difficult to capture in metrics for these types of tools.

In the real world, ground truth is often impossible to determine. Ground truth can be replicated in the lab environment, but only at the cost of using much smaller and less cluttered data sets than those found in the operational world. Because metrics need to show effectively what a tool can do, the inability to replicate the real-world situation often hampers the development of accuracy metrics.
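
Where lab ground truth is available, one common way to quantify extraction accuracy, offered here as an assumption rather than the lab's documented metric, is precision (the share of extracted items that are correct), recall (the share of true items that were found), and their harmonic mean, F1. The function name and the relationship triples below are illustrative.

```python
def score(extracted: set, truth: set) -> dict:
    """Precision, recall, and F1 of a tool's output against lab ground truth."""
    true_pos = len(extracted & truth)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = {("A", "meets", "B"), ("B", "funds", "C"),
             ("C", "calls", "D"), ("D", "meets", "E")}
    found = {("A", "meets", "B"), ("B", "funds", "C"), ("X", "meets", "Y")}
    print(score(found, truth))
```

Extracting two of four true relationships plus one spurious relationship gives precision 2/3, recall 1/2, and F1 4/7. These numbers are only as meaningful as the ground truth behind them, which is the limitation described above.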

For the other basic factors of metric development, the problems faced in the lab are often driven not by data or scenarios but by the immaturity of the tools. Most of the tools in this lab are prototypes, and their system bugs and other problems hinder the ability to test efficiency, stability, improvement, and error handling. The lab continues to improve its metrics to better understand and evaluate tools that will support the Air Force in the future.
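
Efficiency, stability, and error handling can at least be recorded consistently even when a prototype misbehaves. The harness below is a hypothetical sketch: it wraps any callable standing in for a tool, repeats each input several times, times successful runs with `time.perf_counter`, and logs exceptions instead of aborting, so a crash counts against stability rather than ending the experiment. The summary format is an assumption.

```python
import time
from statistics import median


def run_trials(tool, inputs, repeats=3):
    """Run `tool` on every input `repeats` times and summarize the outcome.

    `tool` is any callable standing in for a prototype under test (hypothetical).
    Successful runs contribute a wall-clock time; exceptions are recorded as
    failures instead of stopping the experiment.
    """
    times, failures = [], []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            try:
                tool(x)
            except Exception as exc:  # prototypes may fail in unpredictable ways
                failures.append((x, type(exc).__name__))
                continue
            times.append(time.perf_counter() - start)
    total = len(inputs) * repeats
    return {
        "runs": total,
        "failure_rate": len(failures) / total if total else 0.0,  # stability proxy
        "median_seconds": median(times) if times else None,  # efficiency proxy
        "failures": failures,  # error-handling log
    }


if __name__ == "__main__":
    # Demo tool: integer division fails on a zero input.
    print(run_trials(lambda x: 10 // x, [1, 2, 0], repeats=2))
```

Repeating each input helps separate consistent defects from intermittent ones, which matters when the tools under test are prototypes.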


2.0 SUMMARY AND ASSESSMENT


Several lessons learned from the development of the IU lab can guide and improve the ongoing efforts in the SA lab. The main lesson is how difficult it is to develop accurate metrics for knowledge discovery and information extraction tools. Because of the nature of this technology and our inability to obtain real-world ground truth data, we have struggled to identify metrics that we believe show the true capabilities of many of the tools we have evaluated.

Another lesson learned is that prototype tools are difficult to evaluate: they often break easily or are so unstable that it is nearly impossible to complete experiments on them. It is also very difficult to adapt data for use with these tools. Many of the tools were developed by academic institutions and companies that have little experience working with intelligence analysts or within the field of intelligence analysis. As a result, many of the tools suffer from these shortcomings:

• Too difficult (from a technical standpoint) to be operated by the typical intelligence analyst
• Require an enormous amount of time to access the large amounts of data that the analyst must examine

These issues defeat the tools' intended purpose of helping analysts make the most of their efforts. They also make it extremely difficult to recommend transitioning these tools to analysts and to the organizations that rely on their accurate and timely products. AFRL/IFED has made contact with several agencies regarding the potential transition of tools to their analysts. So far, transitioning tools has been difficult because the tools are either unstable or hard for the analyst to use (or learn) from a training standpoint. It may take an analyst too long to learn a tool, and time is of the essence for an intelligence analyst.

Another problem is the conversion of intelligence data for use by the tools. These tools often require information in a specific format, and the conversion can be a lengthy process that often only the tool developers, rather than the local intelligence analyst, can perform. However, the IU/SA lab has made great strides in making contact with the Air Force, the DIA, and other agencies to introduce new tools to intelligence analysts when the technology is ready for transition. The SA lab will continue to look for tools that help our intelligence analysts complete their jobs in the most timely and accurate manner possible.
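
Much of the conversion burden is mechanical reformatting. As an illustration only, the sketch below turns a CSV of subject, relation, and object rows into a node/edge JSON layout of the kind a link analysis tool might ingest. Both the CSV column names and the JSON layout are hypothetical, since each real tool defines its own input format.

```python
import csv
import io
import json


def csv_to_graph_json(csv_text: str) -> str:
    """Convert subject,relation,object rows into a hypothetical node/edge JSON layout.

    Each distinct name becomes a node with a sequential integer id; each row
    becomes an edge that references those ids.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes, edges = {}, []
    for row in reader:
        for name in (row["subject"], row["object"]):
            # Assign the next id only the first time a name is seen.
            nodes.setdefault(name, {"id": len(nodes), "label": name})
        edges.append({
            "source": nodes[row["subject"]]["id"],
            "target": nodes[row["object"]]["id"],
            "relation": row["relation"],
        })
    return json.dumps({"nodes": list(nodes.values()), "edges": edges})


if __name__ == "__main__":
    sample = "subject,relation,object\nAlpha,meets,Bravo\nBravo,funds,Charlie\n"
    print(csv_to_graph_json(sample))
```

Scripting conversions like this one is a way an analyst could reuse a single mapping across many data deliveries instead of relying on the tool developers for each one.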

Even with the many problems faced by the IU lab, the effort was a success. Tools that had languished on shelves were examined and evaluated to the greatest extent possible. Observations, comments, and recommendations were used to improve many of the tools, leading to their eventual deployment or to the identification of future enhancements. Many of the tools were integrated with other tools, which aided the development of each. Deficiencies found in many of the tools have led to the development of other tools, expanding the scope of, and need for, additional research and development in the field of knowledge discovery and information extraction.